Image Difference Captioning with Pre-training and Contrastive Learning
Authors
Abstract
The Image Difference Captioning (IDC) task aims to describe the visual differences between two similar images with natural language. The major challenges of this task lie in two aspects: 1) fine-grained visual differences that require learning stronger vision and language associations, and 2) high-cost manual annotations that lead to limited supervised data. To address these challenges, we propose a new modeling framework following the pre-training-finetuning paradigm. Specifically, we design three self-supervised tasks with contrastive learning strategies to align visual differences and text descriptions at a fine-grained level. Moreover, a data expansion strategy is designed to utilize extra cross-task supervision information, such as data for image classification, to alleviate the limitation of available supervised IDC data. Extensive experiments on two benchmark datasets, CLEVR-Change and Birds-to-Words, demonstrate the effectiveness of the proposed framework. The codes and models will be released at https://github.com/yaolinli/IDC.
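As a rough illustration of the contrastive alignment idea described in the abstract, the sketch below implements a generic symmetric InfoNCE-style loss between image-difference embeddings and caption embeddings in PyTorch. The function name, tensor shapes, and temperature value are illustrative assumptions; this is not the paper's actual set of pre-training tasks.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(diff_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE loss between image-difference and text embeddings.

    diff_feats, text_feats: (batch, dim) tensors whose matching rows form
    positive pairs. Generic sketch only, not the paper's exact objective.
    """
    diff_feats = F.normalize(diff_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)

    # Cosine-similarity logits for every pair in the batch.
    logits = diff_feats @ text_feats.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Contrast in both directions: difference-to-text and text-to-difference.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_i2t + loss_t2i)
```

In such a setup, in-batch negatives push apart mismatched difference/description pairs, which is one common way to obtain the fine-grained vision-language alignment the abstract refers to.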
Similar resources
Contrastive Learning for Image Captioning
Image captioning, a popular topic in computer vision, has achieved substantial progress in recent years. However, the distinctiveness of natural descriptions is often overlooked in previous work. It is closely related to the quality of captions, as distinctive captions are more likely to describe images with their unique aspects. In this work, we propose a new learning method, Contrastive Learn...
Temporal-difference Learning with Sampling Baseline for Image Captioning
Existing methods for image captioning usually train the language model under the cross-entropy loss, which results in exposure bias and an inconsistency with the evaluation metric. Recent research has shown that these two issues can be well addressed by the policy gradient method from the reinforcement learning domain, attributable to its unique capability of directly optimizing the discrete and non-differentia...
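For context on the policy-gradient approach sketched above, the following minimal REINFORCE-style loss with a sampled baseline shows the general technique; the function and argument names are hypothetical and this is not the cited paper's exact formulation.

```python
import torch

def policy_gradient_loss(log_probs, sample_rewards, baseline_rewards):
    """REINFORCE-style loss with a sampled baseline for caption generation.

    log_probs:        (batch,) summed token log-probabilities of sampled captions.
    sample_rewards:   (batch,) sentence-level metric scores (e.g. CIDEr) of the samples.
    baseline_rewards: (batch,) scores of baseline captions (e.g. a second sample
                      or a greedy decode). Generic sketch, not the paper's method.
    """
    advantage = (sample_rewards - baseline_rewards).detach()
    # Maximizing expected reward is equivalent to minimizing the negative
    # advantage-weighted log-likelihood of the sampled captions.
    return -(advantage * log_probs).mean()
```

Subtracting a sampled baseline reduces the variance of the gradient estimate while leaving it unbiased, which is what makes direct optimization of non-differentiable metrics practical.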
Deep Learning for Automatic Image Captioning in Poor Training Conditions
Recent advancements in Deep Learning show that the combination of Convolutional Neural Networks and Recurrent Neural Networks enables the definition of very effective methods for the automatic captioning of images. Unfortunately, this straightforward result requires the existence of large-scale corpora, which are not available for many languages. This paper describes a simple methodo...
Learning to Evaluate Image Captioning
Evaluation metrics for image captioning face two challenges. Firstly, commonly used metrics such as CIDEr, METEOR, ROUGE and BLEU often do not correlate well with human judgments. Secondly, each metric has well-known blind spots to pathological caption constructions, and rule-based metrics lack provisions to repair such blind spots once identified. For example, the newly proposed SPICE correlate...
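As a small illustration of the rule-based metrics discussed above, the snippet below computes a smoothed sentence-level BLEU score with NLTK; the captions are toy data chosen for this example.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Tokenized reference captions and one candidate caption (toy example).
references = [["a", "small", "bird", "perched", "on", "a", "branch"],
              ["a", "bird", "sitting", "on", "a", "tree", "branch"]]
candidate = ["a", "bird", "on", "a", "branch"]

# BLEU-4 with smoothing, one of the rule-based metrics discussed above.
score = sentence_bleu(references, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")
```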
Actor-Critic Sequence Training for Image Captioning
Generating natural language descriptions of images is an important capability for a robot or other visual-intelligence driven AI agent that may need to communicate with human users about what it is seeing. Such image captioning methods are typically trained by maximising the likelihood of the ground-truth annotated caption given the image. While simple and easy to implement, this approach does not ...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2022
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v36i3.20218